
High Speed and High Accuracy Pre-Classification Method for OCR: Margin Added Hashing

Internal identifier: 000195 (Main/Exploration); previous: 000194; next: 000196

Authors: Yutaka Katsuyama [Japan]; Yoshinobu Hotta [Japan]; Masako Omachi [Japan]; Shinichiro Omachi [Japan]

Source:

IEICE transactions on information and systems [ISSN 0916-8532]; 2013.

RBID : Pascal:13-0301010

French descriptors

Pascal (Inist)
- Précision élevée, Reconnaissance optique caractère, Hachage, Complexité temps, Japonais, Temps traitement, Haute performance, Modulation amplitude, Redondance, Evaluation performance, Apprentissage, Agrégation, Classification automatique, Réseau neuronal, Appareillage essai, Reconnaissance forme, Classification signal.

English descriptors

KwdEn :
- Aggregation, Amplitude modulation, Automatic classification, Hashing, High performance, High precision, Japanese, Learning, Neural network, Optical character recognition, Pattern recognition, Performance evaluation, Processing time, Redundancy, Signal classification, Testing equipment, Time complexity.

Abstract

Reducing the time complexity of character matching is critical to the development of efficient Japanese Optical Character Recognition (OCR) systems. To shorten the processing time, recognition is usually split into separate pre-classification and precise recognition stages. For high overall recognition performance, the pre-classification stage must both have very high classification accuracy and return only a small number of putative character categories for further processing. Furthermore, for any practical system, the speed of the pre-classification stage is also critical. The associative matching (AM) method has often been used for fast pre-classification because of its use of a hash table and reliance on just logical bit operations to select categories, both of which make it highly efficient. However, a certain level of redundancy exists in the hash table because it is constructed using only the minimum and maximum values of the data on each axis and therefore does not take account of the distribution of the data. We propose a novel method based on the AM method that satisfies the performance criteria described above but in a fraction of the time by modifying the hash table to reduce the range of each category of training characters. Furthermore, we show that our approach outperforms pre-classification by VQ clustering, ANN, LSH and AM in terms of classification accuracy, reducing the number of candidate categories and total processing time across an evaluation test set comprising 116,528 Japanese character images.
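As a rough illustration of the baseline the abstract describes (a table built from per-axis minimum and maximum values, queried with logical bit operations), here is a minimal sketch. It is not the authors' implementation and does not include the proposed range-reducing modification. The function names, the uniform binning, the bin count and the assumed feature range [0, 1) are all hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def to_bin(x: float, n_bins: int, lo: float, hi: float) -> int:
    """Map a feature value to a bin index, clamped to [0, n_bins - 1]."""
    idx = int((x - lo) / (hi - lo) * n_bins)
    return max(0, min(n_bins - 1, idx))


def build_table(
    samples: Dict[str, List[Sequence[float]]],
    n_bins: int = 10,
    lo: float = 0.0,
    hi: float = 1.0,
) -> Tuple[List[str], List[List[int]]]:
    """Build table[axis][bin]: an integer bitmask with one bit per category.

    A category's bit is set for every bin between the minimum and maximum
    value its training samples take on that axis (assumed layout).
    """
    categories = sorted(samples)
    n_axes = len(samples[categories[0]][0])
    table = [[0] * n_bins for _ in range(n_axes)]
    for bit, cat in enumerate(categories):
        for axis in range(n_axes):
            values = [vec[axis] for vec in samples[cat]]
            first = to_bin(min(values), n_bins, lo, hi)
            last = to_bin(max(values), n_bins, lo, hi)
            for b in range(first, last + 1):
                table[axis][b] |= 1 << bit
    return categories, table


def candidates(
    query: Sequence[float],
    categories: List[str],
    table: List[List[int]],
    n_bins: int = 10,
    lo: float = 0.0,
    hi: float = 1.0,
) -> List[str]:
    """AND the per-axis bitmasks; the surviving bits are candidate categories."""
    mask = (1 << len(categories)) - 1
    for axis, x in enumerate(query):
        mask &= table[axis][to_bin(x, n_bins, lo, hi)]
    return [cat for i, cat in enumerate(categories) if (mask >> i) & 1]


# Toy two-dimensional training data (invented for illustration only).
samples = {
    "A": [(0.15, 0.25), (0.35, 0.45)],
    "B": [(0.65, 0.15), (0.95, 0.35)],
    "C": [(0.25, 0.75), (0.55, 0.95)],
}
categories, table = build_table(samples)
print(candidates((0.25, 0.35), categories, table))  # ['A']
print(candidates((0.75, 0.55), categories, table))  # []
```

Because each category occupies the full span between its minimum and maximum on every axis, a query can fall inside that span even where no training sample lies nearby. This is the kind of redundancy the abstract attributes to ignoring the distribution of the data.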

Affiliations:

Links toward previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000046
to stream PascalFrancis, to step Curation: 000722
to stream PascalFrancis, to step Checkpoint: 000038
to stream Main, to step Merge: 000198
to stream Main, to step Curation: 000195

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">High Speed and High Accuracy Pre-Classification Method for OCR: Margin Added Hashing</title>
<author><name sortKey="Katsuyama, Yutaka" sort="Katsuyama, Yutaka" uniqKey="Katsuyama Y" first="Yutaka" last="Katsuyama">Yutaka Katsuyama</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>FUJITSU LABORATORIES LTD.</s1>
<s2>Kawasaki-shi, 211-8588</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>FUJITSU LABORATORIES LTD.</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Hotta, Yoshinobu" sort="Hotta, Yoshinobu" uniqKey="Hotta Y" first="Yoshinobu" last="Hotta">Yoshinobu Hotta</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>FUJITSU LABORATORIES LTD.</s1>
<s2>Kawasaki-shi, 211-8588</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>FUJITSU LABORATORIES LTD.</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Omachi, Masako" sort="Omachi, Masako" uniqKey="Omachi M" first="Masako" last="Omachi">Masako Omachi</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Sendai National College of Technology</s1>
<s2>Natori-shi, 981-1239</s2>
<s3>JPN</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>Sendai National College of Technology</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Omachi, Shinichiro" sort="Omachi, Shinichiro" uniqKey="Omachi S" first="Shinichiro" last="Omachi">Shinichiro Omachi</name>
<affiliation wicri:level="4"><inist:fA14 i1="03"><s1>Tohoku University</s1>
<s2>Sendai-shi, 980-8579</s2>
<s3>JPN</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<placeName><settlement type="city">Sendai</settlement>
<region type="province">Région de Tōhoku</region>
</placeName>
<orgName type="university">Université du Tōhoku</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">13-0301010</idno>
<date when="2013">2013</date>
<idno type="stanalyst">PASCAL 13-0301010 INIST</idno>
<idno type="RBID">Pascal:13-0301010</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000046</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000722</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000038</idno>
<idno type="wicri:doubleKey">0916-8532:2013:Katsuyama Y:high:speed:and</idno>
<idno type="wicri:Area/Main/Merge">000198</idno>
<idno type="wicri:Area/Main/Curation">000195</idno>
<idno type="wicri:Area/Main/Exploration">000195</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">High Speed and High Accuracy Pre-Classification Method for OCR: Margin Added Hashing</title>
<author><name sortKey="Katsuyama, Yutaka" sort="Katsuyama, Yutaka" uniqKey="Katsuyama Y" first="Yutaka" last="Katsuyama">Yutaka Katsuyama</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>FUJITSU LABORATORIES LTD.</s1>
<s2>Kawasaki-shi, 211-8588</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>FUJITSU LABORATORIES LTD.</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Hotta, Yoshinobu" sort="Hotta, Yoshinobu" uniqKey="Hotta Y" first="Yoshinobu" last="Hotta">Yoshinobu Hotta</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>FUJITSU LABORATORIES LTD.</s1>
<s2>Kawasaki-shi, 211-8588</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>FUJITSU LABORATORIES LTD.</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Omachi, Masako" sort="Omachi, Masako" uniqKey="Omachi M" first="Masako" last="Omachi">Masako Omachi</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Sendai National College of Technology</s1>
<s2>Natori-shi, 981-1239</s2>
<s3>JPN</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>Sendai National College of Technology</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Omachi, Shinichiro" sort="Omachi, Shinichiro" uniqKey="Omachi S" first="Shinichiro" last="Omachi">Shinichiro Omachi</name>
<affiliation wicri:level="4"><inist:fA14 i1="03"><s1>Tohoku University</s1>
<s2>Sendai-shi, 980-8579</s2>
<s3>JPN</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<placeName><settlement type="city">Sendai</settlement>
<region type="province">Région de Tōhoku</region>
</placeName>
<orgName type="university">Université du Tōhoku</orgName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">IEICE transactions on information and systems</title>
<title level="j" type="abbreviated">IEICE trans. inf. syst.</title>
<idno type="ISSN">0916-8532</idno>
<imprint><date when="2013">2013</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">IEICE transactions on information and systems</title>
<title level="j" type="abbreviated">IEICE trans. inf. syst.</title>
<idno type="ISSN">0916-8532</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Aggregation</term>
<term>Amplitude modulation</term>
<term>Automatic classification</term>
<term>Hashing</term>
<term>High performance</term>
<term>High precision</term>
<term>Japanese</term>
<term>Learning</term>
<term>Neural network</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Performance evaluation</term>
<term>Processing time</term>
<term>Redundancy</term>
<term>Signal classification</term>
<term>Testing equipment</term>
<term>Time complexity</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Précision élevée</term>
<term>Reconnaissance optique caractère</term>
<term>Hachage</term>
<term>Complexité temps</term>
<term>Japonais</term>
<term>Temps traitement</term>
<term>Haute performance</term>
<term>Modulation amplitude</term>
<term>Redondance</term>
<term>Evaluation performance</term>
<term>Apprentissage</term>
<term>Agrégation</term>
<term>Classification automatique</term>
<term>Réseau neuronal</term>
<term>Appareillage essai</term>
<term>Reconnaissance forme</term>
<term>Classification signal</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Reducing the time complexity of character matching is critical to the development of efficient Japanese Optical Character Recognition (OCR) systems. To shorten the processing time, recognition is usually split into separate pre-classification and precise recognition stages. For high overall recognition performance, the pre-classification stage must both have very high classification accuracy and return only a small number of putative character categories for further processing. Furthermore, for any practical system, the speed of the pre-classification stage is also critical. The associative matching (AM) method has often been used for fast pre-classification because of its use of a hash table and reliance on just logical bit operations to select categories, both of which make it highly efficient. However, a certain level of redundancy exists in the hash table because it is constructed using only the minimum and maximum values of the data on each axis and therefore does not take account of the distribution of the data. We propose a novel method based on the AM method that satisfies the performance criteria described above but in a fraction of the time by modifying the hash table to reduce the range of each category of training characters. Furthermore, we show that our approach outperforms pre-classification by VQ clustering, ANN, LSH and AM in terms of classification accuracy, reducing the number of candidate categories and total processing time across an evaluation test set comprising 116,528 Japanese character images.</div>
</front>
</TEI>
<affiliations><list><country><li>Japon</li>
</country>
<region><li>Région de Tōhoku</li>
</region>
<settlement><li>Sendai</li>
</settlement>
<orgName><li>Université du Tōhoku</li>
</orgName>
</list>
<tree><country name="Japon"><noRegion><name sortKey="Katsuyama, Yutaka" sort="Katsuyama, Yutaka" uniqKey="Katsuyama Y" first="Yutaka" last="Katsuyama">Yutaka Katsuyama</name>
</noRegion>
<name sortKey="Hotta, Yoshinobu" sort="Hotta, Yoshinobu" uniqKey="Hotta Y" first="Yoshinobu" last="Hotta">Yoshinobu Hotta</name>
<name sortKey="Omachi, Masako" sort="Omachi, Masako" uniqKey="Omachi M" first="Masako" last="Omachi">Masako Omachi</name>
<name sortKey="Omachi, Shinichiro" sort="Omachi, Shinichiro" uniqKey="Omachi S" first="Shinichiro" last="Omachi">Shinichiro Omachi</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000195 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000195 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:13-0301010
   |texte=   High Speed and High Accuracy Pre-Classification Method for OCR: Margin Added Hashing
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
